The invention is described in the following statement:



EMPHASIS OF SHORT-DURATION TRANSIENT SPEECH FEATURES 
Field of the Invention 

This invention relates to the processing of signals derived from sound
stimuli, particularly for the generation of stimuli in auditory prostheses, such as
cochlear implants and hearing aids, and in other systems requiring sound processing
or encoding.

Background of the Invention 

Various speech processing strategies have been developed for processing
sound signals for use in stimulating auditory prostheses, such as cochlear prostheses
and hearing aids. Some strategies focus on particular aspects of speech, such as
formants. Other strategies rely on more general channelization and
amplitude-related selection, such as the Spectral Maxima Sound Processor (SMSP)
strategy, which is described in greater detail in Australian Patent No. 657959 by
the present applicant.

A recurring difficulty with all such sound processing systems is the provision
of adequate information to the user to enable optimal perception of speech in the
sound stimulus.

Summary of the Invention and Object 

It is an object of the present invention to provide a sound processing strategy
to assist in perception of low-intensity short-duration speech features in the sound
stimuli.

The invention provides a sound processing device having means for 
estimating the amplitude envelope of a sound signal in a plurality of spaced 
frequency channels, means for analyzing the estimated amplitude envelopes over 
time so as to detect short-duration amplitude transitions in said envelopes, means 
for increasing the relative amplitude of said short-duration amplitude transitions for 
the duration of said transitions, including means for determining the rate of change 
of said short-duration amplitude transitions for the whole duration of the transitions, 
and using this rate of change to determine the size of the increase in relative 
amplitude applied to said transitions.

In a preferred form, the means for determining the rate of change of said
short-duration amplitude transitions determines a rate of change profile over time 
for each transition. 

In a particularly preferred form, the faster/greater the rate of change of said
short-duration amplitude transitions, the greater the increase in relative amplitude
applied to said transitions. Furthermore, rate of change profiles corresponding to
short-duration burst transitions receive a greater increase in relative amplitude than
do profiles corresponding to onset transitions. In the present specification, a "burst
transition" is understood to be a rapid increase followed by a rapid decrease in the
amplitude envelope, while an "onset transition" is understood to be a rapid increase
followed by a relatively constant level in the amplitude envelope.

The above-defined Transient Emphasis strategy has been designed in
particular to assist perception of low-intensity short-duration speech features for the
severe-to-profound hearing impaired or cochlear implantees. These speech features
typically consist of: i) low-intensity short-duration noise bursts/frication energy that
accompany plosive consonants; and ii) rapid transitions in frequency of speech formants
(in particular the 2nd formant, F2) such as those that accompany articulation of
plosive, nasal and other consonants. Improved perception of these features has been
found to aid perception of some consonants (namely plosives and nasals) as well as
overall speech perception when presented in competing background noise.

The Transient Emphasis strategy is preferably applied as a front-end process
to other speech processing systems, particularly but not exclusively, for stimulating
implanted electrode arrays. The currently preferred embodiment of the invention is
incorporated into the Spectral Maxima Sound Processor (SMSP) strategy, as
referred to above. The combined strategy, known as the Transient Emphasis
Spectral Maxima (TESM) Sound Processor, utilises the transient emphasis strategy
to emphasise the SMSP's filter bank outputs prior to selection of the channels with
the largest amplitudes.

As with most multi-channel speech processing systems, the input sound
signal is divided up into a multitude of frequency channels by using a bank of
band-pass filters. The signal envelope is then derived by rectifying and low-pass
filtering the signal in these bands. Emphasis of short-duration transitions in the
envelope signal for each channel is then carried out. This is done by: i) detection of
short-duration (approximately 10 to 60 milliseconds) amplitude variations in the channel
envelope typically corresponding to speech features such as noise bursts, formant
transitions, and voice onset; and ii) increasing the signal gain during these periods.
The gain is adjusted in proportion to the derivative with respect to time (gradient) of
the signal envelope (or some similar rule, as described below in the Description of
Preferred Embodiment).

During periods of steady-state or relatively slowly varying levels in the
envelope signal (over a period of approximately 60 ms), no gain is applied. During
periods where short-duration transitions in the envelope signal are detected, the
amount of gain applied can typically vary from 0 dB to 12 dB. The gain varies
depending on the nature of the short-duration transition, which can be classified as
either of the following: i) A rapid increase followed by a decrease in the signal
envelope (over a period of no longer than approximately 60 ms). This typically
corresponds to speech features such as the noise burst in a plosive consonant or the
rapid frequency shift of a formant in a consonant-to-vowel or vowel-to-consonant
transition. ii) A rapid increase followed by a relatively constant level in the signal
envelope, which typically corresponds to speech features such as the onset of
voicing in a vowel. Short-duration speech features classified according to i) are
considered to be more important to perception than those classified according to ii)
and thus receive approximately twice as much gain. Note that a relatively constant
level followed by a rapid decrease in the signal envelope, which corresponds to an
abrupt cessation of voicing/sound, does not receive any gain.

Brief Description of the Drawings

In order that the invention may be more readily understood, one presently 
preferred embodiment of the invention will now be described with reference to the 
accompanying drawings in which: 

Figure 1 is a schematic representation of the signal processing applied to the
sound signal in accordance with the present invention, and

Figures 2 and 3 are comparative electrodograms of sound signals to show the
effect of the invention.

Description of Preferred Embodiment

Referring to Figure 1, the presently preferred embodiment of the invention is
described with reference to its use with the SMSP strategy. As with the SMSP
strategy, electrical signals corresponding to sound signals received via a
microphone 1 and pre-amplifier 2 are processed by a bank of N parallel filters 3
tuned to adjacent frequencies (typically N = 16). Each filter channel includes a
band-pass filter 4, then a rectifier 5 and low-pass filter 6 to provide an estimate of
the signal amplitude (envelope) in each channel. In this embodiment a Fast Fourier
Transform (FFT) implementation of the filter bank is employed. The outputs of the
N-channel filter bank are modified by the transient emphasis algorithm 7 (as
described below) prior to further processing by the SMSP strategy.
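The band-pass, rectify and low-pass chain for one channel can be sketched as follows. This is only an illustrative time-domain approximation: the embodiment itself uses an FFT filter bank, and the biquad band-pass design, the filter Q, the 50 Hz envelope smoothing cut-off and the 16 kHz sampling rate used here are all assumptions that do not come from the specification.

```python
import math


def bandpass_coeffs(center_hz, q, fs):
    """Band-pass biquad with unity gain at the centre frequency (assumed design)."""
    w0 = 2.0 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                    # feed-forward terms
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)    # feedback terms
    return b, a


def channel_envelope(samples, center_hz, fs, q=4.0, smooth_hz=50.0):
    """Estimate the amplitude envelope of one channel.

    Steps: band-pass filter (4), full-wave rectifier (5), low-pass filter (6).
    q and smooth_hz are hypothetical values chosen for illustration.
    """
    b, a = bandpass_coeffs(center_hz, q, fs)
    k = math.exp(-2.0 * math.pi * smooth_hz / fs)  # one-pole low-pass coefficient
    x1 = x2 = y1 = y2 = 0.0
    env = 0.0
    out = []
    for x in samples:
        # Band-pass filter (direct form I biquad).
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        # Rectify, then smooth to obtain the envelope estimate.
        env = k * env + (1.0 - k) * abs(y)
        out.append(env)
    return out


if __name__ == "__main__":
    fs = 16000
    tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(3200)]
    print(channel_envelope(tone, 1000.0, fs)[-1])
```

A tone at the channel centre frequency settles to an envelope near the mean of a rectified sine, while a tone well outside the band gives a much smaller value.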

The envelope signal in each of the N channels (denoted S_n, where the
subscript n refers to the channel number) is further low-pass filtered 8 (denoted E_n)
so as to attenuate any frequency components above approximately 50-100 Hz (such
as the voicing frequency). This low-pass filter (2nd-order, low-pass cut-off frequency
of 25 Hz) introduces a group delay (T) to the signal (typically T = 10 ms). A running
history, which spans a time period of 6 × T (typically 60 ms), of the original
envelope signal (denoted S_n(t) as a function of time) and the low-pass filtered
envelope signal (E_n(t) as a function of time) is maintained 9. Time (t) is relative to
the original envelope signal S_n(t). An emphasis gain coefficient (denoted G_n) is
then calculated 10 for each channel from the low-pass filtered envelope signal
according to equation (1).
Equation 1:

$$G_n = \frac{2 \times E_n(t_1) - 2 \times E_n(t_2) - E_n(t_0)}{E_n(t_0) + E_n(t_1) + E_n(t_2)}$$

where t_0 = -T, t_1 = -3 × T, t_2 = -5 × T
(typically t_0 = -10 ms, t_1 = -30 ms, t_2 = -50 ms)
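Equation (1) can be evaluated directly from three samples of the smoothed envelope. The sketch below is a minimal illustration; the function name and the guard against a zero denominator (a silent channel) are assumptions, since the specification does not say how that case is handled. The example inputs are idealised three-point envelopes, so real smoothed envelopes will generally give different values.

```python
def emphasis_gain(e_t0, e_t1, e_t2):
    """Raw emphasis gain coefficient G_n as per equation (1).

    e_t0, e_t1, e_t2 are the smoothed envelope values E_n(t_0), E_n(t_1)
    and E_n(t_2), i.e. the most recent, middle and oldest samples.
    """
    denom = e_t0 + e_t1 + e_t2
    if denom <= 0.0:
        # Assumption: a silent channel receives no emphasis.
        return 0.0
    return (2.0 * e_t1 - 2.0 * e_t2 - e_t0) / denom


if __name__ == "__main__":
    print(emphasis_gain(0.0, 1.0, 0.0))  # burst: rise then fall
    print(emphasis_gain(1.0, 1.0, 0.0))  # onset: rise then constant level
    print(emphasis_gain(1.0, 1.0, 1.0))  # steady state
    print(emphasis_gain(0.0, 1.0, 1.0))  # offset: constant level then fall
```

For these idealised inputs the burst gives 2.0, the onset 0.5, the steady state a negative value and the offset 0.0.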
The amount of gain (G_n) applied (as per equation (1)) thus varies with the
behavior of E_n(t) such that a rapid increase followed by a rapid decrease (over a
time period of no longer than approximately 6 × T) in the envelope signal (E_n(t))
will produce the greatest values of G_n. Typically for speech-like signals, G_n can be
expected to range from 0.0 to 2.0. A rapid increase followed by a relatively
constant level in the envelope signal will produce lower levels of G_n (approximately
half that of the previous condition). A relatively steady-state or slowly varying
envelope signal will produce a negative value of G_n. So too will a steady-state level
followed by a rapid decrease in the envelope signal. The emphasis gain coefficient
is therefore limited 11 such that it can never fall below zero, as per equation (2).
Also, an upper gain limit (L_n) is included to restrict the value of G_n in cases where
short-duration transients in the envelope signal exceed the dynamics of speech-like
signals (such as the slamming of a door, or banging of a drum). L_n can be specified
independently for each frequency channel (typically L_n = 2 for all n).
Equation 2:

$$0 \le G_n \le L_n$$

That is, if (G_n > L_n) then G_n = L_n;
if (G_n < 0) then G_n = 0,

where L_n = emphasis gain limit (typically L_n = 2)
It should be noted that whilst equations (1) and (2) define the rules used for
calculation of the emphasis gain coefficient in this embodiment of the algorithm,
other gradient type rules may also be applicable.
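The limiting step of equation (2) amounts to clamping each channel's coefficient between zero and its own upper limit. A minimal sketch, in which the function names are hypothetical and the default limit of 2 is the typical value quoted above:

```python
def limit_gain(g, upper=2.0):
    """Clamp a raw gain coefficient to the range 0..upper (equation (2))."""
    return max(0.0, min(g, upper))


def limit_gains(gains, limits):
    """Apply a separate upper limit L_n to each channel's coefficient G_n."""
    return [limit_gain(g, upper) for g, upper in zip(gains, limits)]


if __name__ == "__main__":
    raw = [-0.33, 0.5, 1.2, 3.5]      # illustrative raw coefficients
    print(limit_gains(raw, [2.0] * len(raw)))
```

Negative coefficients (steady-state or falling envelopes) become zero, and values above the limit (for example a door slam) are held at the limit.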

Each emphasis gain coefficient (G_n) is then used to scale 13 the original
envelope signal (S_n(t)) according to equation (3). An emphasis gain modifier
constant (K_n) has been included 12 to allow for adjustment of the overall gain of the
algorithm. K_n can be specified independently for each channel so that the amount
of gain applied can be adjusted for each frequency channel (typically K_n = 2 for all
n). During periods of no short-duration transitions in the envelope signal, the
emphasis gain coefficient (G_n) is equal to zero and thus S'_n(t_1) = S_n(t_1). During
periods when short-duration transients in the envelope signal take place, G_n will be
greater than zero and additional gain is applied to S_n(t_1).
Equation 3:

$$S'_n(t_1) = S_n(t_1) \times (1 + K_n \times G_n)$$

where t_1 = -3 × T (typically t_1 = -30 ms)
and K_n = emphasis gain modifier constant (typically K_n = 2)
Note that the gain coefficient (G_n) is applied to the corresponding envelope signal
(S_n(t)) at a point in time mid-way through the analysis history (typically -30 ms).
Thus an overall delay of 3 × T (typically 30 ms) occurs between input to and
output from the transient emphasis algorithm. The above procedure is conducted
independently for each of the channels (that is, for n = 1 to the number of channels
(N)). It is repeated at regular time intervals such that the gain applied per channel
varies over time.
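The per-channel procedure of equations (1) to (3) can be sketched as a small streaming object that keeps the running history and returns the emphasised envelope sample delayed by 3 × T. This is a hedged illustration rather than the applicant's implementation. The class name, the assumption that the newest smoothed sample corresponds to t_0 = -T, the default of four analysis frames per T (10 ms at a 2.5 ms update interval) and the zero-denominator guard are all assumptions made for the sketch.

```python
from collections import deque


class TransientEmphasis:
    """Single-channel sketch of equations (1) to (3) over a running history."""

    def __init__(self, frames_per_t=4, k=2.0, limit=2.0):
        self.n = frames_per_t           # analysis frames per group delay T (assumed)
        self.k = k                      # emphasis gain modifier constant K_n
        self.limit = limit              # upper gain limit L_n
        size = 6 * frames_per_t + 1     # history spanning 6 x T
        self.s_hist = deque([0.0] * size, maxlen=size)
        self.e_hist = deque([0.0] * size, maxlen=size)

    def process(self, s_now, e_now):
        """Push the newest S_n and E_n samples; return S'_n at time -3 x T."""
        self.s_hist.append(s_now)
        self.e_hist.append(e_now)
        n = self.n
        # Assumption: the newest smoothed sample is E_n(t_0), with t_0 = -T.
        e_t0 = self.e_hist[-1]
        e_t1 = self.e_hist[-1 - 2 * n]  # E_n(-3 x T)
        e_t2 = self.e_hist[-1 - 4 * n]  # E_n(-5 x T)
        denom = e_t0 + e_t1 + e_t2
        # Equation (1), with an assumed guard for a silent channel.
        g = (2.0 * e_t1 - 2.0 * e_t2 - e_t0) / denom if denom > 0.0 else 0.0
        # Equation (2): limit the coefficient to 0..L_n.
        g = max(0.0, min(g, self.limit))
        # Equation (3): scale the original envelope at the mid-point of the history.
        return self.s_hist[-1 - 3 * n] * (1.0 + self.k * g)


if __name__ == "__main__":
    channel = TransientEmphasis(frames_per_t=1)
    s = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # one-frame burst
    e = [0.0] + s[:-1]                                       # idealised delay of T
    print([channel.process(a, b) for a, b in zip(s, e)])
```

One such object would be kept per channel, with `process` called once per analysis interval. In the idealised usage above, only the burst frame is scaled, and it appears three frames after it was input.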

The modified envelope signal S'_n(t) replaces the original envelope signal S_n(t)
derived from the N-channel filter bank 3 of the SMSP strategy. As with the SMSP
strategy, M of the N channels of S'_n(t) having the largest amplitude at a given
instant in time are selected 14 (typically M = 6). This occurs at regular time
intervals, which for the transient emphasis strategy are typically 2.5 ms. The M
selected channels are then used to generate M electrical stimuli 15 corresponding in
stimulus intensity and electrode number to the amplitude and frequency of the M
selected channels (as per the SMSP strategy). These M stimuli are transmitted to the
cochlear implant 17 via a radio-frequency link 16 and are used to activate M
corresponding electrode sites.
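The selection of the M largest of the N emphasised channel envelopes at one analysis instant can be sketched as below. The function name and the returned format (channel index paired with amplitude, ordered by channel) are assumptions for illustration; the mapping from amplitude to stimulus current is not shown because it is not detailed here.

```python
import heapq


def select_maxima(envelopes, m=6):
    """Return (channel_index, amplitude) pairs for the m largest channels.

    envelopes holds one emphasised envelope value S'_n per channel for a
    single analysis instant. The result is ordered by channel index.
    """
    top = heapq.nlargest(m, enumerate(envelopes), key=lambda pair: pair[1])
    return sorted(top)


if __name__ == "__main__":
    frame = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7, 0.2, 0.6]   # illustrative values
    print(select_maxima(frame, m=3))
```

Because selection happens after emphasis, a channel carrying a brief low-level burst has a better chance of being among the M selected channels than it would with the unmodified envelopes.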

To illustrate the effect of the TESM strategy, "electrodogram" recordings
from the output of a speech processor used in a cochlear implant system have been
provided for the SMSP and TESM strategies (refer to Figures 2 & 3 respectively).
Electrodograms are somewhat similar to spectrograms for acoustic signals, and show
how the excitation on each of the intra-cochlear electrodes varies with time. Time
is shown along the abscissa and electrode number along the ordinate. For each
stimulus pulse recorded from the output of the processor, a vertical bar is shown in
the electrodogram at the time and electrode position of the stimulus. The height of
the bar represents the stimulus intensity (log current), where zero height corresponds
to the hearing threshold for electrical stimulation at the specified electrode position,
and maximum height corresponds to maximum comfortable loudness for electrical
stimulation at the specified electrode position. The speech token presented in these
recordings was 'aka' and was spoken by a male speaker.

Inspecting the electrodograms, it can be seen that the amplitude of the stimuli
representing the plosive burst in the medial consonant 'k' has been emphasised (i.e.
temporarily increased in amplitude) by the TESM strategy 18 relative to the
standard SMSP strategy. So too have the 2nd formant (F2) transitions from the initial
vowel 'a' to 'k' 19, and from 'k' to the final vowel 'a' 20. Also worth noting is that the
onset of the initial vowel 'a' has been slightly emphasised 21.

Since modifications within the spirit and scope of the invention may be
readily effected by persons skilled in the art, it is to be understood that the invention
is not limited to the particular embodiment described, by way of example,
hereinabove.



DATED: 26 October 1999

CARTER SMITH & BEADLE 
Patent Attorneys for the Applicant: 
THE UNIVERSITY OF MELBOURNE 
